Abstract

Gibbs sampling, also known as Glauber dynamics, is a popular technique for sampling high-dimensional
distributions defined on graphs. Of special interest is the behavior of Gibbs sampling on the
Erdős-Rényi random graph G(n, d/n), where each edge is chosen independently with probability d/n
and d is fixed. While the average degree in G(n, d/n) is d(1 - o(1)), it contains many nodes of degree
of order log n / log log n.

The existence of nodes of almost logarithmic degree implies that for many natural distributions
defined on G(n, p), such as uniform coloring (with a constant number of colors) or the Ising model at
any fixed inverse temperature $\beta$, the mixing time of Gibbs sampling is at least $n^{1+\Omega(1/\log\log n)}$. Recall
that the Ising model with inverse temperature $\beta$ defined on a graph G = (V, E) is the distribution over
$\{\pm\}^V$ given by $P(\sigma) \propto \exp(\beta \sum_{(v,u)\in E} \sigma(v)\sigma(u))$. High degree nodes pose a technical challenge
in proving polynomial time mixing of the dynamics for many models including the Ising model and
coloring. Almost all known sufficient conditions in terms of $\beta$ or the number of colors needed for rapid
mixing of Gibbs samplers are stated in terms of the maximum degree of the underlying graph.

In this work we show that for every $d < \infty$ and the Ising model defined on G(n, d/n), there exists
a $\beta_d > 0$ such that for all $\beta < \beta_d$, with probability going to 1 as $n \to \infty$, the mixing time of the
dynamics on G(n, d/n) is polynomial in n. Our results are the first polynomial time mixing results
proven for a natural model on G(n, d/n) for d > 1 where the parameters of the model do not depend on
n. They also provide a rare example where one can prove polynomial time mixing of a Gibbs sampler in
a situation where the actual mixing time is slower than n polylog(n). Our proof exploits in novel ways
the local treelike structure of Erdős-Rényi random graphs, comparison and block dynamics arguments
and a recent result of Weitz.

Our results extend to much more general families of graphs which are sparse in some average sense
and to much more general interactions. In particular, they apply to any graph for which every vertex v
of the graph has a neighborhood N(v) of radius O(log n) in which the induced sub-graph is a tree union
at most O(log n) edges and where for each simple path in N(v) the sum of the vertex degrees along
the path is O(log n). Moreover, our results also apply in the case of arbitrary external fields and provide
the first FPRAS for sampling the Ising distribution in this case. We finally present a non-Markov-chain
algorithm for sampling the distribution which is effective for a wider range of parameters. In particular,
for G(n, d/n) it applies for all external fields and $\beta < \beta_d$, where $d\tanh(\beta_d) = 1$ is the critical point for
decay of correlation for the Ising model on G(n, d/n).

1 Introduction



Efficient approximate sampling from Gibbs distributions is a central challenge of randomized algorithms.
Examples include sampling from the uniform distribution over independent sets of a graph [27, 26, 6, 8],
sampling from the uniform distribution of perfect matchings in a graph [?], or sampling from the
uniform distribution of colorings [12, ?] of a graph. A natural family of approximate sampling techniques
is given by Gibbs samplers, also known as Glauber dynamics. These are reversible Markov chains that
have the desired distribution as their stationary distribution and where at each step the status of one vertex
is updated. It is typically easy to establish that the chains will eventually converge to the desired distribution.

Studying the convergence rate of the dynamics is interesting from both the theoretical computer science
and the statistical physics perspectives. Approximate convergence in time polynomial in the size of the system,
sometimes called rapid mixing, is essential in computer science applications. The convergence rate is also
of natural interest in physics where the dynamical properties of such distributions are extensively studied,
see e.g. [20]. Much recent work has been devoted to determining sufficient and necessary conditions for
rapid convergence of Gibbs samplers. A common feature of most of this work [27, 26, 6, 8, 12, 4, 18, 22]
is that the conditions for convergence are stated in terms of the maximal degree of the underlying graph. In
particular, these results do not allow for the analysis of the mixing rate of Gibbs samplers on the Erdős-Rényi
random graph, which is sparse on average, but has rare denser sub-graphs. Recent work has been directed
at showing how to relax statements so that they do not involve maximal degrees [5, 13], but the results are
not strong enough to imply rapid mixing of Gibbs sampling for the Ising model on G(n, d/n) for d > 1
and any $\beta > 0$, or for sampling uniform colorings from G(n, d/n) for d > 1 and 1000d colors. The second
challenge is presented as the major open problem of [5].

In this paper we give the first rapid convergence result of Gibbs samplers for the Ising model on
Erdős-Rényi random graphs in terms of the average degree and $\beta$ only. Our results hold for the Ising model
allowing different interactions and arbitrary external fields. We note that there is an FPRAS that samples
from the Ising model on any graph [16] as long as all the interactions are positive and the external field is
the same for all vertices. However, these results do not provide an FPRAS in the case where different nodes
have different external fields as we do here.

Our results are further extended to much more general families of graphs that are "tree-like" and "sparse on
average". These are graphs where every vertex has a radius O(log n) neighborhood which is a tree with at
most O(log n) edges added and where for each simple path in the neighborhood, the sum of degrees along
the path is O(log n). An important open problem [?] is to establish similar conditions for other models
defined on graphs, such as the uniform distribution over colorings.

Below we define the Ising model and Gibbs samplers and state our main result. Some related work and a
sketch of the proof are also given in the introduction. Section 2 gives a more detailed proof, though we have
not tried to optimize any of the parameters in the proofs below.

1.1 The Ising Model 

The Ising model is perhaps the simplest model defined on graphs. This model defines a distribution on
labelings of the vertices of the graph by + and −. The Ising model has various natural generalizations
including the uniform distribution over colorings. The Ising model with varying parameters is of use in a
variety of areas of machine learning, most notably in vision, see e.g. [?].

Definition 1.1 The (homogeneous) Ising model on a (weighted) graph G with inverse temperature $\beta$ is a
distribution on configurations $\{\pm\}^V$ such that

$$P(\sigma) = \frac{1}{Z(\beta)} \exp\Big(\beta \sum_{\{v,u\}\in E} \sigma(v)\sigma(u)\Big) \qquad (1)$$

where $Z(\beta)$ is a normalizing constant.

More generally, we will be interested in (inhomogeneous) Ising models defined by:

$$P(\sigma) = \frac{1}{Z(\beta)} \exp\Big(\sum_{\{v,u\}\in E} \beta_{u,v}\,\sigma(v)\sigma(u) + \sum_{v} h_v \sigma(v)\Big), \qquad (2)$$

where the $h_v$ are arbitrary and where $\beta_{u,v} \ge 0$ for all u and v. In the more general case we will write
$\beta = \max_{u,v} \beta_{u,v}$.



1.2 Gibbs Sampling 

The Gibbs sampler is a Markov chain on configurations where a configuration $\sigma$ is updated by choosing a
vertex v uniformly at random and assigning it a spin according to the Gibbs distribution conditional on the
spins on $G - \{v\}$.

Definition 1.2 Given a graph G = (V, E) and an inverse temperature $\beta$, the Gibbs sampler is the discrete
time Markov chain on $\{\pm\}^V$ where, given the current configuration $\sigma$, the next configuration $\sigma'$ is obtained
by choosing a vertex v in V uniformly at random and

• Letting $\sigma'(w) = \sigma(w)$ for all $w \ne v$.

• $\sigma'(v)$ is assigned the spin + with probability

$$\frac{\exp\big(h_v + \sum_{u:(v,u)\in E} \beta_{u,v}\sigma(u)\big)}{\exp\big(h_v + \sum_{u:(v,u)\in E} \beta_{u,v}\sigma(u)\big) + \exp\big(-h_v - \sum_{u:(v,u)\in E} \beta_{u,v}\sigma(u)\big)}.$$
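A single update of Definition 1.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary-based graph representation and the three-vertex path used as an example are assumptions made here, not part of the paper.

```python
import math
import random

def plus_probability(v, sigma, neighbors, beta, h):
    """Conditional probability that vertex v receives spin +1 given its neighbours."""
    field = h[v] + sum(beta[frozenset((v, u))] * sigma[u] for u in neighbors[v])
    return math.exp(field) / (math.exp(field) + math.exp(-field))

def gibbs_step(sigma, neighbors, beta, h, rng):
    """One discrete-time step: pick a uniformly random vertex and resample its spin."""
    v = rng.choice(sorted(sigma))
    new_sigma = dict(sigma)  # all other spins are copied unchanged
    p = plus_probability(v, sigma, neighbors, beta, h)
    new_sigma[v] = +1 if rng.random() < p else -1
    return new_sigma

# Hypothetical example: the path 0 - 1 - 2 with unit couplings and no external field.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
beta = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
h = {0: 0.0, 1: 0.0, 2: 0.0}
sigma = {0: +1, 1: -1, 2: +1}
rng = random.Random(0)
for _ in range(10):
    sigma = gibbs_step(sigma, neighbors, beta, h, rng)
```

Note that the update probability depends only on the spins adjacent to v, so a step costs time proportional to the degree of v.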

We will be interested in the time it takes the dynamics to get close to the distributions (1) and (2). The
mixing time $\tau_{mix}$ of the chain is defined as the number of steps needed in order to guarantee that the chain,
starting from an arbitrary state, is within total variation distance 1/(2e) from the stationary distribution. We
will bound the mixing time by the relaxation time defined below.



It is well known that Gibbs sampling is a reversible Markov chain with stationary distribution P. Let
$1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_m \ge -1$ denote the eigenvalues of the transition matrix of Gibbs sampling. The
spectral gap is defined as $\min\{1 - \lambda_2, 1 - |\lambda_m|\}$ and the relaxation time $\tau$ is the inverse of the spectral
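gap. The relaxation time can be given in terms of the Dirichlet form of the Markov chain by the equation

$$\tau = \sup\Big\{\frac{2\sum_{\sigma} P(\sigma) f(\sigma)^2}{\sum_{\sigma\neq\tau} Q(\sigma,\tau)\,(f(\sigma)-f(\tau))^2} : \sum_{\sigma} P(\sigma) f(\sigma) = 0\Big\} \qquad (3)$$

where $f : \{\pm\}^V \to \mathbb{R}$ is any function on configurations, $Q(\sigma,\tau) = P(\sigma)P(\sigma \to \tau)$ and $P(\sigma \to \tau)$ is the
transition probability from $\sigma$ to $\tau$. We use the result that for reversible Markov chains the relaxation time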
satisfies

$$\tau \le \tau_{mix} \le \tau\Big(1 + \frac{1}{2}\log\big((\min_{\sigma} P(\sigma))^{-1}\big)\Big) \qquad (4)$$

where $\tau_{mix}$ is the mixing time (see e.g. [2]) and so by bounding the relaxation time we can bound the mixing
time up to a polynomial factor.
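Reversibility with respect to P can be checked by brute force on a very small graph. The sketch below is an illustration only: it assumes a homogeneous coupling `beta`, a three-vertex path and arbitrarily chosen fields, builds the full transition kernel of the single-site Gibbs sampler, and measures the largest violation of detailed balance $P(\sigma)P(\sigma \to \tau) = P(\tau)P(\tau \to \sigma)$.

```python
import itertools
import math

def ising_weights(n, edges, beta, h):
    """Unnormalised weights exp(beta * sum sigma_u sigma_v + sum h_v sigma_v)."""
    weights = {}
    for sigma in itertools.product((-1, 1), repeat=n):
        energy = beta * sum(sigma[u] * sigma[v] for u, v in edges)
        energy += sum(h[v] * sigma[v] for v in range(n))
        weights[sigma] = math.exp(energy)
    return weights

def gibbs_transition(n, edges, beta, h):
    """Transition probabilities P(sigma -> tau) of the single-site Gibbs sampler."""
    w = ising_weights(n, edges, beta, h)
    kernel = {s: {} for s in w}
    for s in w:
        for v in range(n):  # vertex v is chosen with probability 1/n
            for spin in (-1, 1):
                t = s[:v] + (spin,) + s[v + 1:]
                other = s[:v] + (-spin,) + s[v + 1:]
                p = w[t] / (w[t] + w[other]) / n
                kernel[s][t] = kernel[s].get(t, 0.0) + p
    return w, kernel

# Hypothetical example: a path on three vertices with small external fields.
n, edges, beta, h = 3, [(0, 1), (1, 2)], 0.7, [0.3, -0.2, 0.0]
w, kernel = gibbs_transition(n, edges, beta, h)
Z = sum(w.values())
max_violation = max(
    abs(w[s] / Z * kernel[s].get(t, 0.0) - w[t] / Z * kernel[t].get(s, 0.0))
    for s in w for t in w
)
```

The enumeration is exponential in the number of vertices, so this is a sanity check for toy instances rather than a way to estimate relaxation times.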

While our results are given for the discrete time Gibbs sampler described above, it will at times be
convenient to consider the continuous time version of the model. Here sites are updated at rate 1 by independent
Poisson clocks. The two chains are closely related: since the continuous time chain performs on average n
updates per unit of time, the relaxation time of the discrete chain is n times the relaxation time of the
continuous time Markov chain (see e.g. [2]).

For our proofs it will be useful to use the notion of block dynamics. The Gibbs sampler can be generalized to
update blocks of vertices rather than individual vertices. For blocks $V_1, V_2, \ldots, V_k \subset V$ with $V = \cup_i V_i$ the
block dynamics of the Gibbs sampler updates a configuration $\sigma$ by choosing a block $V_i$ uniformly at random
and assigning the spins in $V_i$ according to the Gibbs distribution conditional on the spins on $G - V_i$. There
is also a continuous analog in which the blocks each update at rate 1. In continuous time, the relaxation time
of the Gibbs sampler can be given in terms of the relaxation time of the block dynamics and the relaxation
times of the Gibbs sampler on the blocks.

Proposition 1.3 In continuous time, if $\tau_{block}$ is the relaxation time of the block dynamics and $\tau_i$ is the
maximum relaxation time on $V_i$ given any boundary condition from $G - V_i$, then by Proposition 3.4 of [?],

$$\tau \le \tau_{block}\,\big(\max_i \tau_i\big)\,\max_{v \in V} \#\{j : v \in V_j\}. \qquad (5)$$

1.2.1 Monotone Coupling 

For two configurations $X, Y \in \{-,+\}^V$ we let $X \succeq Y$ denote that X is greater than or equal to Y
pointwise. When all the interactions are positive, it is well known that the Ising model is a monotone
system under this partial ordering, that is, if $X \succeq Y$ then

$$P\big(\sigma_v = + \mid \sigma_{V\setminus\{v\}} = X_{V\setminus\{v\}}\big) \ge P\big(\sigma_v = + \mid \sigma_{V\setminus\{v\}} = Y_{V\setminus\{v\}}\big).$$

As it is a monotone system there exists a coupling of Markov chains $\{X_t^x\}_{x \in \{-,+\}^V}$ such that marginally
each has the law of the Gibbs sampler with starting configuration $X_0^x = x$ and further that if $x \succeq y$ then
for all t, $X_t^x \succeq X_t^y$. This is referred to as the monotone coupling and can be constructed as follows: let
$v_1, v_2, \ldots$ be a random sequence of vertices updated by the Gibbs sampler and associate with them i.i.d. random
variables $U_1, U_2, \ldots$ distributed as U[0, 1] which determine how the site is updated. At the ith update the site $v = v_i$
is updated to + if

$$U_i \le \frac{\exp\big(h_v + \sum_{u:(v,u)\in E} \beta_{u,v}\sigma(u)\big)}{\exp\big(h_v + \sum_{u:(v,u)\in E} \beta_{u,v}\sigma(u)\big) + \exp\big(-h_v - \sum_{u:(v,u)\in E} \beta_{u,v}\sigma(u)\big)}$$

and to − otherwise. It is well known that such transitions preserve the partial ordering, which guarantees
that if $x \succeq y$ then $X_t^x \succeq X_t^y$ by the monotonicity of the system. In particular, this implies that it is enough
to bound the time taken to couple from the all + and all − starting configurations.
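The monotone coupling described above can be simulated directly by driving both chains with the same vertex and the same uniform variable. The following is a minimal sketch under assumptions made for illustration only: a homogeneous non-negative coupling `beta`, a six-vertex cycle, and hypothetical function names.

```python
import math
import random

def coupled_step(x, y, neighbors, beta, h, rng):
    """Update both chains at the same vertex using the same uniform variable."""
    v = rng.randrange(len(x))
    u = rng.random()
    for config in (x, y):
        field = h[v] + sum(beta * config[w] for w in neighbors[v])
        p_plus = math.exp(field) / (math.exp(field) + math.exp(-field))
        config[v] = +1 if u <= p_plus else -1

def coupling_time(neighbors, beta, h, rng, max_steps=100_000):
    """Steps until the chains started from all + and all - agree (None if never)."""
    n = len(neighbors)
    top, bottom = [+1] * n, [-1] * n
    for step in range(1, max_steps + 1):
        coupled_step(top, bottom, neighbors, beta, h, rng)
        # With beta >= 0 the partial order top >= bottom is preserved at every step.
        assert all(a >= b for a, b in zip(top, bottom))
        if top == bottom:
            return step
    return None

# Hypothetical example: a cycle on six vertices at a small inverse temperature.
n = 6
neighbors = [[(v - 1) % n, (v + 1) % n] for v in range(n)]
steps = coupling_time(neighbors, beta=0.2, h=[0.0] * n, rng=random.Random(1))
```

Once the two extremal chains agree, every chain started from an intermediate configuration is sandwiched between them and agrees as well, which is why only these two starting states need to be tracked.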

1.3 Erdős-Rényi Random Graphs and Other Models of Graphs

The Erdős-Rényi random graph G(n, p) is the graph with n vertices V and random edges E where each
potential edge $(u, v) \in V \times V$ is chosen independently with probability p. We take p = d/n where d > 1
is fixed. In the case d < 1, it is well known that with high probability all components of G(n, p) are of
logarithmic size, which implies immediately that the dynamics mix in polynomial time for all $\beta$.

For a vertex v in G(n, d/n) let $V(v, l) = \{u \in G : d(u, v) \le l\}$, the set of vertices within distance l of v,
let $S(v, l) = \{u \in G : d(u, v) = l\}$, let $E(v, l) = \{(u, w) \in G : u, w \in V(v, l)\}$ and let B(v, l) be the
graph (V(v, l), E(v, l)).
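For experimentation, G(n, d/n) can be sampled directly from its definition. The sketch below makes one assumption that the text leaves implicit: potential edges are taken to be unordered pairs of distinct vertices. The function name and the parameters n = 200, d = 3 are chosen purely for illustration.

```python
import random

def sample_gnp(n, d, rng):
    """Sample G(n, d/n): each unordered pair of distinct vertices is an edge w.p. d/n."""
    p = d / n
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

adj = sample_gnp(200, 3.0, random.Random(0))
average_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
max_degree = max(len(nbrs) for nbrs in adj.values())
```

Comparing `average_degree` with `max_degree` on larger samples gives a feel for the gap between the average degree and the rare high-degree vertices discussed above.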

Our results only require some simple features of the neighborhoods of all vertices in the graph. 
Definition 1.4 Let G = (V, E) be a graph and v a vertex in G. Let t(G) denote the tree excess of G, i.e.,

t(G) = |E| - |V| + 1.

We call a path $v_1, v_2, \ldots$ self-avoiding if for all $i \ne j$ it holds that $v_i \ne v_j$. We let the maximal path density
m be defined by

$$m(G, v, l) = \max_{\Gamma} \sum_{u \in \Gamma} d_u$$

where the maximum is taken over all self-avoiding paths $\Gamma$ starting at v with length at most l and $d_u$ is the
degree of node u. We write t(v, l) for t(B(v, l)) and m(v, l) for m(B(v, l), v, l).
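Both quantities in Definition 1.4 are easy to compute on small graphs. The sketch below assumes that the length of a path counts its edges (the definition does not say so explicitly) and enumerates self-avoiding paths by depth-first search, so it is only practical for small neighborhoods; the example graph is hypothetical.

```python
def tree_excess(vertices, edges):
    """t(G) = |E| - |V| + 1 for a connected graph G."""
    return len(edges) - len(vertices) + 1

def max_path_density(adj, v, length):
    """Largest sum of degrees along a self-avoiding path from v with <= length edges."""
    def extend(u, visited, remaining):
        best = 0
        if remaining > 0:
            for w in adj[u]:
                if w not in visited:
                    best = max(best, extend(w, visited | {w}, remaining - 1))
        return len(adj[u]) + best
    return extend(v, {v}, length)

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
excess = tree_excess(adj, edges)          # one edge more than a spanning tree
density = max_path_density(adj, 0, 3)     # attained by the path 0, 1, 2, 3
```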

1.4 Our Results 

Throughout we will be using the term with high probability to mean with probability 1 - o(1) as n goes to
$\infty$.

Theorem 1.5 Let G be a random graph distributed as G(n, d/n). When 



tanh(/3) < - T - 



there exists a constant C = C(d) such that the mixing time of the Glauber dynamics is $O(n^C)$ with high
probability over the graph as n goes to $\infty$. The result holds for the homogeneous
model and for the inhomogeneous model (2) provided $|h_v| < 100\beta n$ for all v.

Note that in the theorem above the O(·) bound depends on $\beta$. It may be viewed as a special case of the following
more general result.

Theorem 1.6 Let G = (V, E) be any graph on n vertices satisfying the following properties. There exist
$a > 0$, $0 < b < \infty$ and $0 < c < \infty$ such that for all $v \in V$ it holds that

$$t(v, a\log n) \le b\log n, \qquad m(v, a\log n) \le c\log n.$$

Then if 



tanh(/3) < -, 

e l ' a (c — a) 

there exists a constant $C = C(a, b, c, \beta)$ such that the mixing time of the Glauber dynamics is $O(n^C)$. The
result holds for the homogeneous model and for the inhomogeneous model (2) provided $|h_v| < 100\beta n$
for all v.

Remark 1.7 The condition that $|h_v| < 100\beta n$ for all v will be needed in the proof of the result in the
general case (2). However, we note that given Theorem 1.6 as a black box, it is easy to extend the result
and provide an efficient sampling algorithm in the general case without any bounds on the $h_v$. In the case
where some of the vertices v satisfy $|h_v| > 10\beta n$, it is easy to see that the target distribution satisfies, except
with exponentially small probability, that $\sigma_v = +$ for all v with $h_v > 10\beta n$ and $\sigma_v = -$ for all v with
$h_v < -10\beta n$. Thus we may set $\sigma_v = +$ when $h_v > 10\beta n$ and $\sigma_v = -$ when $h_v < -10\beta n$ and consider the
dynamics where these values are fixed. Doing so will effectively restrict the dynamics to the graph spanned
by the remaining vertices and will modify the values of $h_v$ for the remaining vertices; however, it is easy to
see that all remaining vertices will have $|h_v| < 100\beta n$. It is also easy to verify that if the original graph
satisfied the hypothesis of Theorem 1.6 then so does the restricted one. Therefore we obtain an efficient
sampling procedure for the desired distribution.

1.5 Related Work and Open Problems 

Much work has been focused on the problem of understanding the mixing time of the Ising model in various
contexts. In a series of results [14, 1, 28] culminating in [25] it was shown that the Gibbs sampler on the integer
lattice mixes rapidly when the model has the strong spatial mixing property. In $\mathbb{Z}^2$ strong spatial mixing, and
therefore rapid mixing, holds in the entire uniqueness regime (see e.g. [?]). On the regular tree the mixing
time is always polynomial but is only O(n log n) up to the threshold for extremality [?]. For completely
general graphs the best known results are given by the Dobrushin condition, which establishes rapid mixing
when $d\tanh(\beta) < 1$ where d is the maximum degree.

Most results for mixing rates of Gibbs samplers are stated in terms of the maximal degree. For example,
many results have focused on sampling uniform colorings; the results are of the form: for every graph where
all degrees are at most d, if the number of colors q satisfies $q \ge q(d)$ then Gibbs sampling is rapidly
mixing [27, 26, 6, 8, 12, 4, 18, 22]. For example, Jerrum [?] showed that one can take q(d) = 2d.
The novelty of the result presented here is that it allows for the study of graphs where the average degree is
small while some degrees may be large.

Previous attempts at studying this problem, with bounded average degree but some large degrees, for
sampling uniform colorings yielded weaker results. In [5] it is shown that Gibbs sampling rapidly mixes
on G(n, d/n) if $q = \Omega_d((\log n)^{\alpha})$ where $\alpha < 1$ and that a variant of the algorithm rapidly mixes if
$q \ge \Omega_d(\log\log n / \log\log\log n)$. Indeed the main open problem of [5] is to determine if one can take q to
be a function of d only. Our results here provide a positive answer to the analogous question for the Ising
model. We further note that other results where the conditions on degree are relaxed [13] do not apply in
our setting.

The following propositions, which are easy and well known, establish that for d > 1 and large $\beta$ the mixing
time is exponential in n, and that for all d > 0 and $\beta > 0$ the mixing time is more than n polylog(n).

Proposition 1.8 If d > 0 and $\beta > 0$ then with high probability the mixing time of the dynamics on
G(n, d/n) is at least $n^{1+\Omega(1/\log\log n)}$.

Proof: The proof follows from the fact that G(n, d/n) contains an isolated star with $s = \Omega(\log n/\log\log n)$
vertices with high probability and that the mixing time of the star is $s\exp(\Omega(s))$. Since the star is updated
with frequency s/n, it follows that the mixing time is at least

$$(n/s)\, s\exp(\Omega(s)) = n\exp(\Omega(s)) = n^{1+\Omega(1/\log\log n)}.$$



Proposition 1.9 If d > 1 then there exists $\beta'_d$ such that if $\beta > \beta'_d$ then, with probability going to 1, the
mixing time of the dynamics on G(n, d/n) is $\exp(\Omega(n))$.

Proof: The claim follows from expansion properties of G(n, d/n). It is well known that if d > 1 then with
high probability G(n, d/n) contains a core C of size at least $\alpha_d n$ such that every $S \subset C$ of size at
least $\alpha_d n/4$ has at least $\gamma_d n$ edges between S and $C \setminus S$. Let A be the set of configurations $\sigma$ such that $\sigma$
restricted to C has at least $\alpha_d n/4$ +'s and at least $\alpha_d n/4$ −'s. Then $P(A) \le 2^n \exp(\beta|E| - 2\beta\gamma_d n)/Z$. On the
other hand, if + denotes the all + state then $P(+) = P(-) = \exp(\beta|E|)/Z$. Thus by standard conductance
arguments, the mixing time is exponential in n when $2\exp(-2\beta\gamma_d) < 1$. ■

It is natural to conjecture that properties of the Ising model on the branching process with Poisson(d)
offspring distribution determine the mixing time of the dynamics on G(n, d/n). In particular, it is natural
to conjecture that the critical point for uniqueness of Gibbs measures plays a fundamental role [10, 24],
as results of similar flavor were recently obtained for the hard-core model on random bipartite d-regular
graphs [23].

Conjecture 1.10 If $d\tanh(\beta) > 1$ then with high probability over G(n, d/n) the mixing time of the Gibbs
sampler is $\exp(\Omega(n))$. If d > 1 and $d\tanh(\beta) < 1$ then with high probability over G(n, d/n) the mixing
time of the Gibbs sampler is polynomial in n.

After proposing the conjecture we have recently learned that Antoine Gerschenfeld and Andrea Montanari
have found an elegant proof for estimating the partition function (that is, the normalizing constant $Z(\beta)$)
for the Ising model on random d-regular graphs [?]. Their result together with a standard conductance
argument shows exponentially slow mixing above the uniqueness threshold, which in the context of random
regular graphs is $(d + 1)\tanh(\beta) = 1$.

1.6 Proof Technique 

Our proof follows the following main steps. 

• Analysis of the mixing time for Gibbs sampling on trees of varying degrees. We find a bound on the
mixing time on trees in terms of the maximal sum of degrees along any simple path from the root.
This implies that for all $\beta$, if we consider a tree where each node has a number of descendants that
has Poisson distribution with parameter d - 1, then with high probability the mixing time of Gibbs
sampling on the tree is polynomial in its size. The motivation for this step is that we are looking at
tree-like graphs. Note, however, that the results established here hold for all $\beta$, while rapid mixing for
G(n, d/n) does not hold for all $\beta$. Our analysis here holds for all boundary conditions and all external
fields on the tree.

• We next use standard comparison arguments to extend the result above to the case where the graph is
a tree with a few edges added. Note that with high probability for all $v \in G(n, d/n)$ the induced
subgraph $B(v, \frac{1}{2}\log_d n)$ on all vertices of distance at most $\frac{1}{2}\log_d n$ from v is a tree with at most a few
edges added. (Note this still holds for all $\beta$.)

• We next consider the effect of the boundary on the root of the tree. We show that for a tree of a log n
levels, the total variation distance between the conditional distributions at the root given all + boundary
conditions and all − boundary conditions is $n^{-1-\Omega(1)}$ with probability $1 - n^{-1-\Omega(1)}$, provided $\beta < \beta_d$
is sufficiently small (this is the only step where the fact that $\beta$ is small is used).

• Using the construction of Weitz [27] and a lemma from [18] we show that the spatial decay
established in the previous step also holds with probability 1 - o(1) for all neighborhoods B(v, a log n) in
the graph.

• The remaining steps use the fact that a strong enough decay of correlation inside blocks, each of which
is rapidly mixing, implies that the dynamics on the full graph is rapidly mixing. This idea is taken
from [?].

• In order to show rapid mixing it suffices to exhibit a coupling of the dynamics starting at all + and all
− that couples with probability at least 1/2 in polynomial time. We show that the monotone coupling
(where the configuration started at − is always "below" the configuration started at +) satisfies this by
showing that for each v, in polynomial time the two configurations at v are coupled except with probability
$n^{-1}/(2e)$.

• In order to establish the latter fact, it suffices to show that running the dynamics on B(v, a log n)
starting at all + with all + boundary conditions and the dynamics starting at all − with all − boundary
conditions will couple at v except with probability $n^{-1}/(2e)$ within polynomial time.

• The final fact then follows from the fact that the dynamics inside B(v, a log n) have polynomial
mixing time and that the stationary distributions in $B(v, \frac{1}{2}\log_d n)$ given + and − boundary conditions
agree at v with probability at least $1 - n^{-1}/(4e)$.

We note that the decay of correlation on the self-avoiding tree defined by Weitz that we prove here allows a
different sampling scheme from the target distribution. Indeed, this decay of correlation implies that given
any assignment to a subset of the vertices S and any $v \notin S$ we may calculate, using the Weitz tree of radius
a log n, in polynomial time the conditional probability that $\sigma(v) = +$ up to an additive error that is
polynomially small in n. It is easy to see that this allows sampling the distribution in polynomial time. More
specifically, consider the following algorithm from [27].

Algorithm 1.11 Fix a radius parameter L and label the vertices $v_1, \ldots, v_n$. Then the algorithm
approximately samples from $P(\sigma)$ by assigning the spins of the $v_i$ sequentially. For $1 \le i \le n$:

• In step i construct $T^L_{SAW}(v_i)$, the tree of self-avoiding walks truncated at distance L from $v_i$.

• Calculate

$$p_i = P_{T^L_{SAW}(v_i)}\big(\sigma_{v_i} = + \mid \sigma_{v_1}, \ldots, \sigma_{v_{i-1}}\big).$$

(The boundary conditions at the tree can be chosen arbitrarily; in particular, one may calculate $p_i$
with no boundary conditions.)

• Fix $\sigma_{v_i} = X_{v_i}$ where $X_{v_i}$ is a random variable with $p_i = P(X_{v_i} = +) = 1 - P(X_{v_i} = -)$.
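The sequential structure of Algorithm 1.11 can be illustrated with a toy sketch in which the truncated self-avoiding-walk tree computation is replaced by exhaustive enumeration. That substitution is an assumption made purely for illustration: it yields the exact conditional probabilities rather than the approximations $p_i$, and it takes time exponential in n, so it shows only the order in which spins are fixed. All function names are hypothetical.

```python
import itertools
import math
import random

def weight(sigma, edges, beta, h):
    """Unnormalised Ising weight of a full configuration (a tuple of +1/-1)."""
    return math.exp(beta * sum(sigma[u] * sigma[v] for u, v in edges)
                    + sum(h[v] * sigma[v] for v in range(len(sigma))))

def conditional_plus(prefix, n, edges, beta, h):
    """P(sigma_i = + | sigma_0, ..., sigma_{i-1} = prefix), by brute force."""
    i = len(prefix)
    plus = minus = 0.0
    for rest in itertools.product((-1, 1), repeat=n - i - 1):
        plus += weight(prefix + (1,) + rest, edges, beta, h)
        minus += weight(prefix + (-1,) + rest, edges, beta, h)
    return plus / (plus + minus)

def sequential_sample(n, edges, beta, h, rng):
    """Fix the spins one vertex at a time, each drawn from its conditional law."""
    sigma = ()
    for _ in range(n):
        p = conditional_plus(sigma, n, edges, beta, h)
        sigma += (1,) if rng.random() < p else (-1,)
    return sigma

# Hypothetical example: a 4-cycle with small, unequal external fields.
n, edges, beta, h = 4, [(0, 1), (1, 2), (2, 3), (3, 0)], 0.4, [0.1, 0.0, -0.3, 0.2]
sample = sequential_sample(n, edges, beta, h, random.Random(0))
```

Because the conditionals here are exact, the product of the probabilities used along the way equals $P(\sigma)$ by the chain rule; in the algorithm proper each factor carries a small error, which is what the total variation bounds below control.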
Then we prove that: 

Theorem 1.12 Let G be a random graph distributed as G(n, d/n). When

$$\tanh(\beta) < \frac{1}{d},$$

for any $\gamma > 0$ there exist constants $r = r(d, \beta, \gamma)$ and $C = C(d, \beta, \gamma)$ such that with high
probability Algorithm 1.11 with parameter r log n has running time $O(n^C)$ and output distribution Q with
$d_{TV}(P, Q) < n^{-\gamma}$. The result holds for the homogeneous model (1) and for the inhomogeneous model (2).






Theorem 1.13 Let G = (V, E) be any graph on n vertices satisfying the following properties. There exist
$a > 0$, $0 < b < \infty$ such that for all $v \in V$,

$$|V_{T_{SAW}(v)}(v, a\log n)| \le b^{a\log n} \qquad (6)$$

where $V_{T_{SAW}(v)}(v, r) = \{u \in T_{SAW}(v) : d(u, v) \le r\}$. When

tanh(/3) < -, 

for any $\gamma > 0$ there exist constants $r = r(a, b, \beta, \gamma)$ and $C = C(a, b, \beta, \gamma)$ such that Algorithm 1.11 with
parameter r log n has running time $O(n^C)$ and output distribution Q with $d_{TV}(P, Q) < n^{-\gamma}$. The result
holds for the homogeneous model and for the inhomogeneous model (2).

2 Proofs

2.1 Relaxation time on Sparse and Galton Watson Trees 

Recall that the local neighborhood of a vertex in G(n, d/n) looks like a branching process tree. In the first 
step of the proof we bound the relaxation time on a tree generated by a Galton-Watson branching process. 
More generally, we show that trees that are not too dense have polynomial mixing time. 

Definition 2.1 Let T be a finite rooted tree. We define $m(T) = \max_{\Gamma} \sum_{v \in \Gamma} d_v$ where the maximum is taken
over all simple paths $\Gamma$ emanating from the root and $d_v$ is the degree of node v.

Theorem 2.2 Let $\tau$ be the relaxation time of the continuous time Gibbs sampler on T where $0 \le \beta_{u,v} \le \beta$
for all u and v and given arbitrary boundary conditions and external field. Then

$$\tau \le \exp(4\beta m(T)).$$

Proof: 

We proceed by induction on m with a similar argument to the one used in [18] for a regular tree. Note that if
m = 0 the claim holds true since $\tau = 1$. For the general case, let v be the root of T, denote its children
by $u_1, \ldots, u_k$ and denote the subtree of the descendants of $u_i$ by $T^i$. Now let T' be the tree obtained by
removing the k edges from v to the $u_i$, let P' be the Ising model on T' and let $\tau'$ be the relaxation time on
T'. By equation (3) we have that

$$\frac{\tau}{\tau'} \le \frac{\max_{\sigma} P(\sigma)/P'(\sigma)}{\min_{\sigma,\tau} Q(\sigma,\tau)/Q'(\sigma,\tau)} \le \exp(4\beta k). \qquad (7)$$

Now we divide T into k + 1 blocks $\{\{v\}, \{T^1\}, \ldots, \{T^k\}\}$. Since these blocks are not connected to
each other, the block dynamics is simply the product chain. Each block updates at rate 1 and therefore the
relaxation time of the block dynamics is simply 1. By applying Proposition 1.3 we get that the relaxation
time on T' is simply the maximum of the relaxation times on the blocks,

$$\tau' \le \max_i \{1, \tau^i\},$$

where $\tau^i$ is the relaxation time on $T^i$. Note that by the definition of m, it follows that the value of m for
each of the subtrees $T^i$ satisfies $m(T^i) \le m - k$, and therefore for all i it holds that $\tau^i \le \exp(4\beta(m - k))$.
This then implies by (7) that $\tau \le \exp(4\beta m)$ as needed. ■

2.2 Some properties of Galton Watson Trees 

Here we prove a couple of useful properties of Galton-Watson trees that will be used below. We let T be
the tree generated by a Galton-Watson branching process with offspring distribution N such that for all t,
$E\exp(tN) < \infty$ and such that E(N) = d. Of particular interest to us is the Poisson distribution
with mean d, which has

$$E\exp(tN) = \exp(d(e^t - 1)).$$
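The closed form for the Poisson moment generating function can be checked numerically by summing the defining series $\sum_k e^{tk} e^{-d} d^k / k!$. The sketch below is a plain numerical sanity check; the function names, the truncation at 200 terms and the values d = 3, t = 0.5 are arbitrary choices made for illustration.

```python
import math

def poisson_mgf_series(d, t, terms=200):
    """Sum over k of exp(t*k) * exp(-d) * d**k / k!, accumulated term by term."""
    total = 0.0
    term = math.exp(-d)  # the k = 0 term
    for k in range(terms):
        total += term
        term *= d * math.exp(t) / (k + 1)  # ratio of consecutive terms
    return total

def poisson_mgf_closed_form(d, t):
    """exp(d * (e^t - 1)), the formula stated in the text."""
    return math.exp(d * (math.exp(t) - 1.0))

series_value = poisson_mgf_series(3.0, 0.5)
closed_value = poisson_mgf_closed_form(3.0, 0.5)
```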

We let $T_r$ denote the first r levels of T. We let M(r) denote the value of m for $T_r$ and $\tau(r)$ the supremum
of the relaxation times of the continuous time Gibbs sampler on $T_r$ over any boundary conditions and
external fields, assuming that $\beta = \sup \beta_{u,v}$. We denote by $Z_r$ the number of descendants at level r.

Theorem 2.3 Under the assumptions above we have:

• There exists a positive function c(t) such that for all t and all r:

$$E[\exp(tM(r))] \le \exp(c(t)r).$$

• $E\tau(r) \le C(\beta)^r$ for some $C(\beta) < \infty$ depending on $\beta = \sup \beta_{u,v}$ only.

• If N is the Poisson distribution with mean d then for all t > 0,

$$\sup_r E[\exp(tZ_r d^{-r})] < \infty.$$

Proof: Let K denote the degree of the root of $T_r$ and for $1 \le i \le K$ let $M_i(r - 1)$ denote the value of m
for the sub-tree of $T_r$ rooted at the i'th child. Then:

$$E[\exp(tM(r))] = E\Big[\max\Big(1, \max_{1 \le i \le K} \exp(t(M_i(r-1) + K))\Big)\Big]$$

$$\le E\Big[1 + \exp(tK)\sum_{i=1}^{K} \exp(tM_i(r-1))\Big]$$

$$\le E[1 + K\exp(tK)]\, E[\exp(tM(r-1))],$$

where the last step uses that the $M_i(r-1)$ are independent of K and that $E[\exp(tM(r-1))] \ge 1$ for $t \ge 0$,
and so the result follows by induction provided that c(t) is large enough so that

$$\exp(c(t)) \ge E(1 + K\exp(tK)).$$

For the second statement of the theorem, note that by the previous theorem we have that

$$E\tau(r) \le E[\exp(4\beta M(r))],$$

where M(r) is the random value of m for the tree $T_r$, so if $C(\beta) = \exp(c(4\beta))$ then $E\tau(r) \le C(\beta)^r$.

For the last part of the theorem, let $N_i$ be independent copies of N and note that

$$E\exp(tZ_{r+1}d^{-(r+1)}) = E\exp\Big(\sum_{i=1}^{Z_r} t d^{-(r+1)} N_i\Big) = E\Big[E\Big[\exp\Big(\sum_{i=1}^{Z_r} t d^{-(r+1)} N_i\Big) \,\Big|\, Z_r\Big]\Big]$$

$$= E\big[(E[\exp(t d^{-(r+1)} N)])^{Z_r}\big] = E\exp\big(\log(E\exp(t d^{-(r+1)} N))\, Z_r\big) \qquad (8)$$

which recursively relates the exponential moments of $Z_{r+1}$ to the exponential moments of $Z_r$. In particular,
since all the exponential moments of $Z_1$ exist, $E\exp(tZ_r) < \infty$ for all t and r. When $0 < s < 1$

$$E\exp(sN) = \sum_{i=0}^{\infty} \frac{s^i E N^i}{i!} \le 1 + sd + s^2\sum_{i=2}^{\infty} \frac{E N^i}{i!} \le \exp(sd(1 + \alpha s)) \qquad (9)$$

provided $\alpha$ is sufficiently large. Now fix a t and let $t_r = t\exp\big(2\alpha t\sum_{i=r+1}^{\infty} d^{-i}\big)$. For some sufficiently large
j we have that $\exp\big(2\alpha t\sum_{i=r+1}^{\infty} d^{-i}\big) < 2$ and $t_r d^{-r} < 1$ for all $r \ge j$. Then for $r \ge j$ by equations (8)
and (9),

$$E\exp(t_{r+1}Z_{r+1}d^{-(r+1)}) = E\exp\big(\log(E\exp(t_{r+1}d^{-(r+1)}N))\,Z_r\big)$$

$$\le E\exp\big(t_{r+1}(1 + \alpha t_{r+1}d^{-(r+1)})\,Z_r d^{-r}\big)$$

$$\le E\exp\big(t_{r+1}(1 + 2\alpha t d^{-(r+1)})\,Z_r d^{-r}\big)$$

$$\le E\exp(t_r Z_r d^{-r})$$

and so

$$\sup_{r \ge j} E\exp(tZ_r d^{-r}) \le \sup_{r \ge j} E\exp(t_r Z_r d^{-r}) = E\exp(t_j Z_j d^{-j}) < \infty$$

which completes the result. ■

When the branching process is super-critical, the number of vertices is $O((EN)^r)$ and the result above gives
that the mixing time is polynomial in the number of vertices on a Galton-Watson branching process tree with high
probability. We remark that all our bounds here are increasing in the degrees of the vertices, so if a random
tree T is stochastically dominated by a Galton-Watson branching process then the same bound applies.

2.3 Relaxation in Tree-Like Graphs 

For the applications considered for random and sparse graphs, it is not always the case that the neighborhood
of a vertex is a tree; instead it is sometimes a tree with a small number of edges added. Using standard
comparison arguments we show that the mixing time of a graph that is a tree with a few edges added is still
polynomial. We also show that with high probability the neighborhoods of all vertices of G(n, d/n) are
tree-like.

Proposition 2.4 Let G be a graph on r vertices with r + s - 1 edges that has a spanning tree T with
m(T) = m. Then the relaxation time $\tau$ of the Glauber dynamics on G with any boundary conditions and
external fields satisfies:

$$\tau \le \exp(4\beta(m + s)).$$

Proof: By equation (7), removing the s edges in G which are not in T decreases the relaxation time by at
most a multiplicative factor of $\exp(4\beta s)$. By Theorem 2.2 the relaxation time of T is at most $\exp(4\beta m)$, so
the relaxation time of G is bounded by $\exp(4\beta(m + s))$. ■

Lemma 2.5 Let G be a random graph distributed as G(n, d/n). The following hold with high probability
over G:

• For $0 < a < \frac{1}{2\log d}$ there exists some c(a, d) such that for all $v \in G$, $m(v, a\log n) \le c\log n$.

• There exists k = k(a, d) > 0 such that for all $v \in G$, $t(v, a\log n) \le k$.

• For $0 < a < \frac{1}{2\log d}$ and every $v \in G$,

$$|B(v, a\log n)| \le 3(1 - d^{-1})\, n^{a\log d}\log n.$$

Proof: We construct a spanning tree $T(v, l)$ of $B(v, l)$ in a standard manner. Take some arbitrary ordering
of the vertices of G. Start with the vertex v and attach it to all its neighbors in G. Now take the minimal
vertex in $S(v, 1)$, according to the ordering, and attach it to all its neighbors in G which are not already in
the tree. Repeat this for each of the vertices in $S(v, 1)$ in increasing order. Repeat this for $S(v, 2)$ and
continue until $S(v, l - 1)$, which completes $T(v, l)$. By construction this is a spanning tree for $B(v, l)$. The
construction can be viewed as a breadth first search of $B(v, l)$ starting from v and exploring according to
our ordering.
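
The breadth first construction described above can be sketched in code. The following Python fragment is
an illustration only: the function name `bfs_spanning_tree`, the adjacency-dictionary representation, and the
use of the natural order on vertex labels as the "arbitrary ordering" are assumptions made here, and
B(v, l) is taken to be the subgraph induced on the vertices within distance l of v. Alongside the tree it
reports the number of edges of the ball that are left out of the tree, which is the kind of quantity that
t(v, l) controls in Lemma 2.5.

```python
def bfs_spanning_tree(adj, v, radius):
    """Build the spanning tree T(v, radius) of the ball B(v, radius) level by level.

    adj maps each vertex to an iterable of its neighbours (undirected graph).
    Returns (dist, tree_edges, excess): dist maps each vertex of the ball to its
    distance from v, tree_edges is the set of tree edges, and excess is the number
    of edges of the induced ball that are not tree edges.
    """
    dist = {v: 0}
    tree_edges = set()
    level = [v]
    for r in range(radius):
        next_level = []
        # Vertices of the current sphere are processed in increasing order.
        for u in sorted(level):
            for w in sorted(adj[u]):
                if w not in dist:
                    dist[w] = r + 1
                    tree_edges.add(frozenset((u, w)))
                    next_level.append(w)
        level = next_level
    # Every edge of G with both endpoints inside the ball (assumed induced subgraph).
    ball_edges = {frozenset((u, w)) for u in dist for w in adj[u] if w in dist}
    excess = len(ball_edges) - len(tree_edges)
    return dist, tree_edges, excess
```

For a 4-cycle explored from one vertex to radius 2, the sketch returns a tree with three edges and one
excess edge, the edge that closes the cycle.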

By a standard argument $T(v, a\log n)$ is stochastically dominated by a Galton-Watson branching process
with offspring distribution Poisson(d). Then by repeating the argument of Theorem 2.3, for some $\delta$,

$$E \exp(m(T(v, a\log n), v, a\log n)) \le \delta^{a\log n}$$

and so,

$$P\big(m(T(v, a\log n), v, a\log n) > (a\delta + 2)\log n\big) = O(n^{-2}),$$

which implies, by a union bound over the n vertices, that with high probability
$m(T(v, a\log n), v, a\log n) \le (a\delta + 2)\log n$ for all v.

If $Z_l$ is the number of offspring in generation l of a Galton-Watson branching process with offspring
distribution Poisson(d) then by Theorem 2.3 we have that $\sup_l E \exp(Z_l/d^l) < \infty$, and since

$$P(|S(v, l)| > 3d^l \log n) \le P(\exp(Z_l/d^l) > n^3) \le n^{-3} E \exp(Z_l/d^l),$$

it follows by a union bound over all $v \in G$ and $1 \le l \le a\log n$ that with high probability, for all v,

$$|B(v, a\log n)| \le 3(1 - d^{-1}) n^{a\log d} \log n. \qquad (10)$$

In the construction of $T(v, a\log n)$ there may be some edges in $B(v, a\log n)$ which are not explored and so
are not in $T(v, a\log n)$. Each edge between $u, w \in V(v, a\log n)$ which is not explored in the construction
of $T(v, a\log n)$ is present in $B(v, a\log n)$ independently of $T(v, a\log n)$ with probability d/n. There
are at most $(3(1 - d^{-1}) n^{a\log d} \log n)^2$ unexplored potential edges. Now when $k > 1/(1 - 2a\log d)$,

$$P\big(\mathrm{Binomial}((3(1 - d^{-1}) n^{a\log d} \log n)^2, d/n) > k\big) = O\big(n^{k(2a\log d - 1)} (\log n)^{2k}\big) = n^{-1-\Omega(1)}$$

so by a union bound with high probability we have $t(v, a\log n) \le k$ for all v. Now a self-avoiding path in
$B(v, a\log n)$ can traverse each of these k edges at most once, so this path can be split into at most $k + 1$
self-avoiding paths in $T(v, a\log n)$, and hence with high probability $m(v, a\log n) \le c\log n$ where
$c = (k + 1)(a\delta + 2)$. ■
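
The binomial tail estimate used in the proof above can be unpacked with a union bound over k-subsets of
trials. The following derivation is a sketch; the shorthand $N = (3(1 - d^{-1}) n^{a\log d} \log n)^2$ and
$p = d/n$ is introduced here for readability and is not notation from the text.

```latex
\begin{align*}
P(\mathrm{Binomial}(N, p) > k) \le P(\mathrm{Binomial}(N, p) \ge k)
  &\le \binom{N}{k} p^k \le (Np)^k \\
  &= \big(9(1 - d^{-1})^2 d\big)^k \, n^{k(2a\log d - 1)} (\log n)^{2k}.
\end{align*}
```

Since $k > 1/(1 - 2a\log d)$ gives $k(2a\log d - 1) < -1$, the right-hand side is $o(n^{-1})$, which is what
the union bound over the n vertices requires.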

Lemma 2.6 When $0 < a < \frac{1}{2\log d}$, with high probability for all $v \in G$,

$$|V_{T_{SAW}(v)}(v, a\log n)| \le O(n^{a\log d} \log n)$$

where $V_{T_{SAW}(v)}(v, r) = \{u \in T_{SAW}(v) : d(u, v) \le r\}$.

Proof: We now count the number of self-avoiding walks of length at most $a\log n$ in $B(v, a\log n)$. By
Lemma 2.5 we have that with high probability for all v, $|B(v, a\log n)| \le 3(1 - d^{-1}) n^{a\log d} \log n$ and
$t(v, a\log n) \le k$. Let $e_1, \dots, e_{t(v, a\log n)}$ denote the edges in $B(v, a\log n)$ which are not in
$T(v, a\log n)$. Now every vertex $u \in T_{SAW}(v)$ corresponds to a unique self-avoiding walk in
$B(v, a\log n)$ from v to u. A self-avoiding walk in $B(v, a\log n)$ passes through each edge at most once, so
in particular it passes through each of the $e_i$ at most once. So a path which begins at v, traverses some
sequence $e_{i_1}, \dots, e_{i_j}$ in particular directions and then ends at u is otherwise uniquely defined,
since the intermediate steps are paths in $T(v, a\log n)$, which are unique. There are at most $k(k!)$
sequences $e_{i_1}, \dots, e_{i_j}$, there are at most $2^k$ choices of directions to travel through them, and
at most $3(1 - d^{-1}) n^{a\log d} \log n$ possible terminal vertices in $B(v, a\log n)$, so
$|V_{T_{SAW}(v)}(v, a\log n)| \le 3(1 - d^{-1}) 2^k k(k!) n^{a\log d} \log n$.
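
To make the object being counted concrete, the following Python sketch enumerates the self-avoiding walks
from v of length at most r by depth first search and returns how many there are. The function name
`count_saw_tree_vertices` and the adjacency-dictionary input are hypothetical choices made for this
illustration. It counts strictly self-avoiding walks only, as in the proof above, and its running time is
proportional to the number of walks, so it is meant for small examples.

```python
def count_saw_tree_vertices(adj, v, radius):
    """Count self-avoiding walks from v with at most `radius` edges.

    Each such walk is one vertex of the self-avoiding-walk tree rooted at v;
    the empty walk is the root itself.
    """
    def extend(path, visited, remaining):
        total = 1  # the walk `path` itself
        if remaining == 0:
            return total
        for w in adj[path[-1]]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                total += extend(path, visited, remaining - 1)
                path.pop()
                visited.remove(w)
        return total

    return extend([v], {v}, radius)
```

On a tree the count equals the number of vertices within distance r of v, since paths in a tree are
unique; each additional edge creates additional walks, which is the effect the factor $2^k k(k!)$ accounts for.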



2.4 Spatial decay of correlation for tree-like neighborhoods 

Proposition 2.7 Let T be a tree such that $m(v, a) \le m$. Then $|S(v, a)| \le \left(\frac{m - a + 1}{a}\right)^a$.

Proof: First we establish inductively that $|S(v, a)|$ is maximized by a spherically symmetric tree, that is
one where the degrees of the vertices depend only on their distance to v (it may be that it is also maximized
by non-spherically symmetric trees). It is clearly true when $a = 0$ so suppose that it is true for all m up
to height $a - 1$. Let $T^*$ be a tree of height a rooted at v that maximizes $|S(T^*, v, a)|$ under the constraint
$m(T^*, v, a) \le m$ and let k be the degree of v. Then each of the subtrees $T_i$ attached to v has depth $a - 1$
and is constrained to have $m(T_i, v_i, a - 1) \le m - k - 1$. Let $T^-$ be a spherically symmetric tree of height
$a - 1$ which has $m(T^-, v, a - 1) \le m - k - 1$ and maximizes $|S(T^-, v, a - 1)|$. A vertex v connected to the
roots of k copies of $T^-$ is a spherically symmetric tree of height a with $m(v, a) = m$ and by our inductive
hypothesis must have boundary size $|S(v, a)|$ at least as large as $T^*$, which completes the induction step.
So suppose that T is spherically symmetric and let $d_i$ be the degree of a vertex at distance i from v. Then by
the arithmetic-geometric inequality

$$|S(v, a)| = d_0 \prod_{i=1}^{a-1} (d_i - 1) \le \left(\frac{1}{a}\Big(\sum_{i=0}^{a-1} d_i - a + 1\Big)\right)^a \le \left(\frac{m - a + 1}{a}\right)^a.$$

We now consider the effect that conditioning on the leaves of a tree can have on the marginal distribution
of the spin at the root. It will be convenient to compare this probability to the Ising model with the same
interaction strengths $\beta_{uv}$ but no external field ($h = 0$), which we will denote $\tilde{P}$.

Lemma 2.8 If T is a tree, P is the Ising model with arbitrary external field (including $h_u = \pm\infty$, meaning
that $\sigma_u$ is set to $\pm$) and $\beta_{u,v} \le \beta$, then for all v,

$$P(\sigma_v = + \mid \sigma_{S(v,l)} = +) - P(\sigma_v = + \mid \sigma_{S(v,l)} = -) \le |S(v, l)| (\tanh\beta)^l.$$

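Proof: Lemma 4.1 of O states that for any vertices $v, u \in T$,

$$P(\sigma_v = + \mid \sigma_u = +) - P(\sigma_v = + \mid \sigma_u = -) \le \tilde{P}(\sigma_v = + \mid \sigma_u = +) - \tilde{P}(\sigma_v = + \mid \sigma_u = -), \qquad (11)$$

where $\tilde{P}$ is the model without external field. If $u_0, u_1, \dots, u_k$ is a path of vertices in T then a
simple calculation yields that

$$\tilde{P}(\sigma_{u_k} = + \mid \sigma_{u_0} = +) - \tilde{P}(\sigma_{u_k} = + \mid \sigma_{u_0} = -) = \prod_{i=1}^{k} \tanh\beta_{u_{i-1},u_i} \le (\tanh\beta)^k. \qquad (12)$$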

Now suppose that $u \in S(v, l)$ and that $\eta^{+}$ and $\eta^{-}$ are configurations on $S(v, l)$ which differ only at
u, where $\eta^{\pm}_u = \pm$. Conditioning is equivalent to setting an infinite external field, so equations (11) and (12)
imply that

$$P(\sigma_v = + \mid \sigma_{S(v,l)} = \eta^{+}) - P(\sigma_v = + \mid \sigma_{S(v,l)} = \eta^{-}) \le (\tanh\beta)^l. \qquad (13)$$

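Take a sequence of configurations $\eta^0, \eta^1, \dots, \eta^{|S(v,l)|}$ on $S(v, l)$ with $\eta^0 = -$ and
$\eta^{|S(v,l)|} = +$, where consecutive configurations differ at a single vertex. By equation (13) we have that

$$P(\sigma_v = + \mid \sigma_{S(v,l)} = \eta^{i+1}) - P(\sigma_v = + \mid \sigma_{S(v,l)} = \eta^{i}) \le (\tanh\beta)^l$$

and so, summing over i,

$$P(\sigma_v = + \mid \sigma_{S(v,l)} = +) - P(\sigma_v = + \mid \sigma_{S(v,l)} = -) \le |S(v, l)| (\tanh\beta)^l$$

which completes the proof. ■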
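
The product formula in equation (12) is easy to confirm numerically on a short path. The Python sketch below
is an illustration under the convention that the zero-field model on a path assigns weight
$\exp(\sum_i \beta_i \sigma_{u_{i-1}} \sigma_{u_i})$ to a configuration; the function names `conditional_plus`
and `influence` are hypothetical. It enumerates all configurations, so it is only suitable for small k.

```python
import itertools
import math

def conditional_plus(betas, sigma0):
    """P(sigma_k = + | sigma_0 = sigma0) on a path with couplings betas and no external field."""
    k = len(betas)
    weight_plus = 0.0
    weight_total = 0.0
    for rest in itertools.product((-1, 1), repeat=k):
        spins = (sigma0,) + rest
        # Unnormalised Gibbs weight of this configuration of the path.
        w = math.exp(sum(b * spins[i] * spins[i + 1] for i, b in enumerate(betas)))
        weight_total += w
        if spins[-1] == 1:
            weight_plus += w
    return weight_plus / weight_total

def influence(betas):
    """Difference of the conditional probabilities appearing on the left of (12)."""
    return conditional_plus(betas, 1) - conditional_plus(betas, -1)
```

Comparing `influence([0.3, 0.7, 0.5])` with the product of the three hyperbolic tangents gives agreement up
to floating point error, and the value is at most $(\tanh 0.7)^3$ as the inequality in (12) predicts.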

Now $B(v, a\log n)$ is not in general a tree so we use the self-avoiding tree construction of Weitz [27] to
reduce the problem to one on a tree. The tree of self-avoiding walks, which we denote $T_{saw}(v, a\log n)$,
is the tree of paths in $B(v, a\log n)$ starting from v and not intersecting themselves, except at the terminal
vertex of the path. Through this construction each vertex in $T_{saw}(v, a\log n)$ can be identified with a vertex
in G, which gives a natural way to relate a subset $\Lambda \subset V$ and a configuration $\sigma_\Lambda$ to the
corresponding subset $\Lambda' \subset T_{saw}(v, a\log n)$ and configuration $\sigma_{\Lambda'}$ in $T_{saw}$.
Furthermore if $A, B \subset V$ then $d(A, B) = d(A', B')$. Each vertex (edge) of $T_{saw}$ corresponds to a vertex
(edge) of G, so $P_{T_{saw}}$ is defined by taking the corresponding external field and interactions. Then
Theorem 3.1 of [21] gives the following result.

Lemma 2.9 For a graph G and $v \in G$ there exists $A \subset T_{saw}$ and some configuration $\tau_A$ on A such that,

$$P_G(\sigma_v = + \mid \sigma_\Lambda) = P_{T_{saw}}(\sigma_v = + \mid \sigma_{\Lambda'}, \tau_{A - \Lambda'}).$$

The set A corresponds to the terminal vertices of paths which return to a vertex already visited by the path.

Corollary 2.10 Suppose that $a, b, c, \beta$ satisfy the hypothesis of Theorem 1.6. Then,

$$\max_v \; P(\sigma_v = + \mid \sigma_{S(v, a\log n)} = +) - P(\sigma_v = + \mid \sigma_{S(v, a\log n)} = -) = o(n^{-1}).$$

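Proof: By applying Lemma 2.9 we have that if $\Lambda = S(v, a\log n)$ then

$$P_G(\sigma_v = + \mid \sigma_\Lambda = +) - P_G(\sigma_v = + \mid \sigma_\Lambda = -) = P_{T_{saw}}(\sigma_v = + \mid \sigma_{\Lambda'} = +, \tau_{A - \Lambda'}) - P_{T_{saw}}(\sigma_v = + \mid \sigma_{\Lambda'} = -, \tau_{A - \Lambda'}).$$

Conditioning on $\tau_A$ is equivalent to setting the external field at $u \in A$ to $\mathrm{sign}(\tau_u)\infty$, hence it
follows by Lemma 2.8 that,

$$P_{T_{saw}}(\sigma_v = + \mid \sigma_{\Lambda'} = +, \tau_{A - \Lambda'}) - P_{T_{saw}}(\sigma_v = + \mid \sigma_{\Lambda'} = -, \tau_{A - \Lambda'}) \le |S_{saw}(v, a\log n)| (\tanh\beta)^{a\log n}$$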
where $S_{saw}(v, a\log n) = \{u \in T_{saw}(v, a\log n) : d(u, v) = a\log n\}$. Now suppose $v = u_1, u_2, \dots, u_k$
is a non-repeating walk in $T_{saw}$ and let $u'_1, u'_2, \dots, u'_k$ be the corresponding walk in G. Then from the
construction of $T_{saw}$ either $u'_1, u'_2, \dots, u'_k$ is a non-repeating walk in G or for some $j < k$, $u'_j = u'_k$,
in which case $u_k$ is a leaf of $T_{saw}$ and so has degree 1. It also follows from the construction of $T_{saw}$ that
the degree of $u_i$ is less than or equal to the degree of $u'_i$, and so we have that
$m(T_{saw}, v, a\log n) \le m(v, a\log n) + 1$. Then by Proposition 2.7

$$|S_{saw}(v, a\log n)| \le \left(\frac{c\log n - a\log n + 2}{a\log n}\right)^{a\log n} = O\big(n^{a\log((c - a)/a)}\big) = o\big(n^{-1}(\tanh\beta)^{-a\log n}\big)$$

which completes the result. ■
2.5 Proof of the Main Result 

Proof: (Theorem 1.6) Let $X_t^+$, $X_t^-$ denote the Gibbs sampler on G started from respectively all + and all $-$,
coupled using the monotone coupling described in Section 1.2.1. Fix some vertex $v \in G$. Define four new
chains $Q_t^+$, $Q_t^-$, $Z_t^+$ and $Z_t^-$. These chains run the Glauber dynamics and are coupled with $X_t^+$ and $X_t^-$
inside $B(v, a\log n)$ by using the same choice of vertices $v_1, v_2, \dots$ and the same choice of update random
variables $U_1, U_2, \dots$, except that they are fixed (i.e. do not update) outside $B(v, a\log n)$. They are given the
following initial and boundary conditions.

• $Q_t^+$ starts from the all + configuration (and therefore has all + boundary conditions during the dynamics).

• $Q_t^-$ starts from the all $-$ configuration (and therefore has all $-$ boundary conditions during the dynamics).

• $Z_t^+$ starts from the all + configuration outside $B(v, a\log n)$ and $Z_0^+$ is distributed according to the
stationary distribution inside $B(v, a\log n)$ given the all + boundary condition (therefore $Z_t^+$ will have
this distribution for all t).

• $Z_t^-$ starts from the all $-$ configuration outside $B(v, a\log n)$ and $Z_0^-$ is distributed according to the
stationary distribution inside $B(v, a\log n)$ given the all $-$ boundary condition (therefore $Z_t^-$ will have
this distribution for all t).

As the Gibbs distribution on $B(v, a\log n)$ with a + boundary condition stochastically dominates the
distribution with a $-$ boundary condition, we can initialize $Z_0^+$ and $Z_0^-$ so that $Z_0^+ \succeq Z_0^-$. By
monotonicity of the updates we have $Q_t^+ \succeq Z_t^+ \succeq Z_t^- \succeq Q_t^-$ for all t. We also have that
$Q_t^+ \succeq X_t^+ \succeq X_t^- \succeq Q_t^-$ on $B(v, a\log n)$. As $Z_t^+$ (respectively $Z_t^-$) starts in the
stationary distribution of the Gibbs sampler given the all + (respectively all $-$) boundary condition, it remains
in the stationary distribution for all time t. Since $Z_t^+(v) \ge Z_t^-(v)$ we have that

$$P(Z_t^+(v) \ne Z_t^-(v)) = P(Z_t^+(v) = +) - P(Z_t^-(v) = +) \le o(n^{-1}),$$

for all t, where the inequality follows from Corollary 2.10. By Proposition 2.4 the continuous time Gibbs
sampler on $B(v, a\log n)$ has relaxation time bounded above by $\exp(4\beta(b + c)\log n)$, which implies that the
discrete time relaxation time satisfies $\tau \le n^{1 + 4\beta(b + c)}$. As each vertex has degree at most $c\log n$,

log(min P(<t)) _1 < IB\E\) + V \h u \ < (lOOcn 2 /? 2 logn) 

CT * ' 

U 

which implies that $\tau_{mix} \le O(n^{4 + 4(b + c)\beta})$ since the mixing time satisfies
$\tau_{mix} \le \tau(1 + \frac{1}{2}\log(\min_\sigma P(\sigma))^{-1})$. For $C = 6 + 4(b + c)\beta$ we have that with high
probability after $t = 2n^C$ steps the Gibbs sampler has chosen every vertex at least
$n^{5 + 4(b + c)\beta} \ge n\tau_{mix}$ times. It follows that the number of updates to $B(v, a\log n)$ is at least n
times its mixing time and so

$$P(Q_t^+(v) \ne Z_t^+(v)) \le d_{TV}(Q_t^+(v), Z_t^+(v)) \le e^{-n} = o(n^{-1}),$$

where $d_{TV}$ denotes the total variation distance, which is always bounded above by $\exp(-t/\tau_{mix})$. We
similarly have that

$$P(Q_t^-(v) \ne Z_t^-(v)) \le o(n^{-1}).$$

It follows that $P(Q_t^+(v) \ne Q_t^-(v)) \le o(n^{-1})$ and hence $P(X_t^+(v) \ne X_t^-(v)) \le o(n^{-1})$ for all v. By a
union bound $P(X_t^+ \ne X_t^-) \le o(1)$, so the mixing time is bounded by $O(n^C)$ as required. ■
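
The monotone coupling that drives this proof can be illustrated with a small simulation. The sketch below is
not the construction analysed above: it runs only the two extremal chains on a whole (small) graph, uses a
single inverse temperature `beta` with no external field, and the names `heat_bath_step` and `coupled_run`
are hypothetical. Both chains receive the same vertex and the same uniform variable at every step, which is
what keeps the chain started from all + above the chain started from all $-$.

```python
import math
import random

def heat_bath_step(spins, adj, beta, v, u):
    """Update vertex v in place: set it to +1 exactly when u < P(sigma_v = + | neighbours)."""
    field = beta * sum(spins[w] for w in adj[v])
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    spins[v] = 1 if u < p_plus else -1

def coupled_run(adj, beta, steps, seed=0):
    """Run Glauber chains from all + and all - with shared randomness.

    Returns the first step at which the two chains agree at every vertex,
    or None if they have not met within `steps` updates.
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    top = {v: 1 for v in vertices}
    bottom = {v: -1 for v in vertices}
    for t in range(1, steps + 1):
        v = rng.choice(vertices)
        u = rng.random()
        heat_bath_step(top, adj, beta, v, u)
        heat_bath_step(bottom, adj, beta, v, u)
        # The shared update preserves the coordinatewise order of the two chains.
        assert all(top[w] >= bottom[w] for w in vertices)
        if top == bottom:
            return t
    return None
```

Once the two extremal chains agree they agree forever, so the returned time bounds the coalescence time
of every pair of starting states under this coupling, which is how the probability of disagreement
controls the mixing time in the argument above.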

Proof: (Theorem 1.3) By Lemma 2.5 with high probability a random graph satisfies the hypothesis of
Theorem 1.6 for small enough $\beta$. To prove the result under the stated bound on $\tanh(\beta)$, the only
modification to the proof of Theorem 1.6 needed is to show that with high probability when
$-1/\log(d\tanh(\beta)) < a < (2\log d)^{-1}$ we still have $P(Z_t^+(v) \ne Z_t^-(v)) \le o(n^{-1})$. We know from
Lemma 2.6 that with high probability

$$|V_{T_{SAW}(v)}(v, a\log n)| \le O(n^{a\log d}\log n) = o\big(n^{-1}(\tanh\beta)^{-a\log n}\big).$$

Now using this bound and repeating the proof of Corollary 2.10 we get that $P(Z_t^+(v) \ne Z_t^-(v)) = o(n^{-1})$
as required.

The mixing time is bounded by $n^{6 + 4(b + c)\beta}$, which, using the upper bound on $\beta$, is in turn bounded
by a power of n that does not need to depend on $\beta$. ■

2.6 Sampling from the distribution through the tree of self avoiding walks 

The proofs of Theorems 1.12 and 1.13 make use of the following lemmas.

Lemma 2.11 Let $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ be two vector valued distributions taking values in some
product space. Suppose that for all $1 \le i \le n$ and all $(x_1, \dots, x_{i-1})$ we have

$$d_{TV}\big((X_i \mid X_1 = x_1, \dots, X_{i-1} = x_{i-1}), (Y_i \mid Y_1 = x_1, \dots, Y_{i-1} = x_{i-1})\big) \le \epsilon_i.$$

Then

$$d_{TV}\big((X_1, \dots, X_n), (Y_1, \dots, Y_n)\big) \le \sum_{i=1}^{n} \epsilon_i.$$

Proof: The proof follows by constructing a coupling of the two distributions under which they differ with
probability at most $\sum_{i=1}^{n} \epsilon_i$, which bounds the total variation distance. The coupling is
performed by first coupling $X_1$ and $Y_1$ so that they agree except with probability $\epsilon_1$. Then at step i,
given the coupling of $(X_1, \dots, X_{i-1})$ and $(Y_1, \dots, Y_{i-1})$ and conditioned on

$$(X_1, \dots, X_{i-1}) = (Y_1, \dots, Y_{i-1}),$$

we couple $X_i$ and $Y_i$ in such a way that they disagree with probability at most $\epsilon_i$. The proof
follows by a union bound over the n steps. ■
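
Lemma 2.11 can be checked by brute force on small binary examples. In the Python sketch below each
distribution on $\{0,1\}^n$ is specified by a function giving the conditional probability that the next
coordinate equals 1 given the prefix; this representation and the names `joint`, `tv` and
`conditional_gaps` are assumptions made for the illustration. For binary coordinates the total variation
distance between two conditionals is the absolute difference of these probabilities.

```python
import itertools

def joint(cond, n):
    """Joint law on {0,1}^n built from cond(prefix) = P(next coordinate = 1 | prefix)."""
    law = {}
    for x in itertools.product((0, 1), repeat=n):
        p = 1.0
        for i in range(n):
            q = cond(x[:i])
            p *= q if x[i] == 1 else 1.0 - q
        law[x] = p
    return law

def tv(p, q):
    """Total variation distance between two laws given as dicts on the same support."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)

def conditional_gaps(cond_x, cond_y, n):
    """eps_i: the largest TV distance between the i-th conditionals over all prefixes."""
    return [max(abs(cond_x(pre) - cond_y(pre))
                for pre in itertools.product((0, 1), repeat=i))
            for i in range(n)]
```

Given two such conditional specifications, comparing `tv(joint(cx, n), joint(cy, n))` with
`sum(conditional_gaps(cx, cy, n))` exhibits the inequality of the lemma; the enumeration is exponential in n.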

Lemma 2.12 Suppose the graph G satisfies that for all $v \in V$ it holds that

$$|V_{T_{SAW}(v)}(v, a)| \le b.$$

Then for all integers j it holds that

$$|V_{T_{SAW}(v)}(v, ja)| \le b^j.$$

Proof: We prove the result by induction on j. Suppose that $u \in S_{T_{SAW}(v)}(v, (j - 1)a)$ and let $T_u$ denote
the subtree of u and its descendants in $V_{T_{SAW}(v)}(v, ja) \setminus V_{T_{SAW}(v)}(v, (j - 1)a)$. Each path from u
in $T_u$ corresponds to a self avoiding walk in G started from u, so it follows that the number of vertices in
$T_u \setminus \{u\}$ is at most $b - 1$. So
$|V_{T_{SAW}(v)}(v, ja) \setminus V_{T_{SAW}(v)}(v, (j - 1)a)| \le b^{j-1}(b - 1)$, which together with the inductive
bound $|V_{T_{SAW}(v)}(v, (j - 1)a)| \le b^{j-1}$ completes the induction. ■

Proof: (Theorem 1.13) Set $r = ja$ where j is the smallest integer for which
$(b\tanh\beta)^{r\log n} \le n^{-1-\gamma}$. By Lemma 2.12, for all i,
$|V_{T_{SAW}(v_i)}(v_i, r\log n)| \le b^{r\log n}$, so $T^{r\log n}_{SAW}(v_i)$, the tree of self avoiding walks of
radius $r\log n$, can be constructed in $O(b^{r\log n}) = O(n^{r\log b})$ steps. Using the standard recursions on a
tree, $p_i$ can be evaluated in $O(n^{r\log b})$ steps, so the running time of the algorithm is $O(n^C)$ where
$C = 1 + r\log b$.
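At step i we calculate $p_i = P_{T^{r\log n}_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})$ to
approximate $P(\sigma_{v_i} = + \mid \sigma_{V_{i-1}})$. Applying Lemma 2.9 we have that

$$P(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}) = P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}, \tau_{A - V_{i-1}}),$$

where $V_j = \{v_1, \dots, v_j\}$, and so if $\Lambda = S_{T_{SAW}(v_i)}(v_i, r\log n)$ then,

\begin{align*}
P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = -, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})
  &\le P(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}) \\
  &\le P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = +, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})
\end{align*}

and similarly

\begin{align*}
P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = -, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})
  &\le P_{T^{r\log n}_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}, \tau_{A - V_{i-1}}) \\
  &\le P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = +, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})
\end{align*}

so

\begin{align*}
&\big|P_{T^{r\log n}_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}, \tau_{A - V_{i-1}}) - P(\sigma_{v_i} = + \mid \sigma_{V_{i-1}})\big| \\
&\quad \le P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = +, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})
  - P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = -, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}}).
\end{align*}

Conditioning on $\sigma_{V_{i-1}}$ and $\tau_A$ is equivalent to setting the external field to be $\pm\infty$. Then by
Lemma 2.8

\begin{align*}
&P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = +, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}})
  - P_{T_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_\Lambda = -, \sigma_{V_{i-1}}, \tau_{A - V_{i-1}}) \\
&\quad \le |S_{T_{SAW}(v_i)}(v_i, r\log n)| (\tanh\beta)^{r\log n} = O(n^{-1-\gamma}).
\end{align*}

If Q is the output of the algorithm then by Lemma 2.11

$$d_{TV}(P, Q) \le \sum_{i=1}^{n} \sup \big|P_{T^{r\log n}_{SAW}(v_i)}(\sigma_{v_i} = + \mid \sigma_{V_{i-1}}, \tau_{A - V_{i-1}}) - P(\sigma_{v_i} = + \mid \sigma_{V_{i-1}})\big| = O(n^{-\gamma})$$

which completes the result. ■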
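
The "standard recursions on a tree" used to evaluate $p_i$ can be sketched as follows. This Python fragment
is an illustration rather than the algorithm of the paper: it assumes a single interaction strength `beta`
on every edge, represents the rooted tree by a hypothetical `children` dictionary, and models conditioning
(an infinite external field) through a `pinned` dictionary. It computes the marginal of the root by passing
subtree weights from the leaves upward, visiting each vertex once.

```python
import math

def root_marginal(children, root, beta, field=None, pinned=None):
    """P(sigma_root = +) for the Ising model on a rooted tree.

    children maps each vertex to the list of its children; field maps vertices to
    external fields (default 0); pinned maps vertices to a fixed spin +1 or -1,
    which plays the role of an infinite external field.
    """
    field = field or {}
    pinned = pinned or {}

    def weights(v):
        # Returns (weight of the subtree at v given sigma_v = +1, same given sigma_v = -1).
        child_weights = [weights(c) for c in children.get(v, [])]
        result = []
        for s in (1, -1):
            if v in pinned and pinned[v] != s:
                result.append(0.0)
                continue
            w = math.exp(field.get(v, 0.0) * s)
            for wp, wm in child_weights:
                # Sum over the child's spin, weighting by the edge interaction.
                w *= math.exp(beta * s) * wp + math.exp(-beta * s) * wm
            result.append(w)
        return tuple(result)

    plus, minus = weights(root)
    return plus / (plus + minus)
```

On a path of three vertices with the far endpoint pinned to + and then to $-$, the two root marginals differ
by $(\tanh\beta)^2$, in agreement with equation (12). The recursion is written recursively for brevity, so very
deep trees would need an iterative traversal.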

Proof: (Theorem 1.12) By Lemma 2.6 the growth bound required by Theorem 1.13 holds with high probability
for any $0 < a < \frac{1}{2\log d}$ and $b > d$, so the result follows by Theorem 1.13.